One-shot Detail Retouching with Patch Space Neural Field based Transformation Blending
Photo retouching is a difficult task for novice users as it requires expert
knowledge and advanced tools. Photographers often spend a great deal of time
generating high-quality retouched photos with intricate details. In this paper,
we introduce a one-shot learning based technique to automatically retouch
details of an input image based on just a single pair of before and after
example images. Our approach provides accurate and generalizable detail edit
transfer to new images. We achieve this by proposing a new representation for
image-to-image maps. Specifically, we propose neural-field-based transformation
blending in patch space to define patch-to-patch transformations for each
frequency band. Parametrizing the map with anchor transformations, associated
blending weights, and spatio-spectrally localized patches allows us to capture
details well while remaining generalizable. We evaluate our technique on both
known ground-truth filters and artist retouching edits. Our method accurately
transfers complex detail retouching edits.
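The key idea is to parametrize the image-to-image map as a neural field over patch-space coordinates that blends a small set of learnable anchor transformations. A minimal PyTorch sketch of that parametrization (layer sizes, the coordinate encoding, and the near-identity anchor initialization are illustrative assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn

class TransformationBlendingField(nn.Module):
    """Sketch: a neural field over patch-space coordinates that predicts
    blending weights for K anchor patch-to-patch transformations."""

    def __init__(self, patch_dim: int, num_anchors: int = 8, hidden: int = 64):
        super().__init__()
        # K learnable anchor transformations (linear maps on flattened
        # patches), initialized near the identity (an assumption).
        self.anchors = nn.Parameter(
            torch.eye(patch_dim).repeat(num_anchors, 1, 1)
            + 0.01 * torch.randn(num_anchors, patch_dim, patch_dim))
        # Small MLP: patch-space coordinate (x, y, frequency band) -> K weights.
        self.weight_field = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_anchors),
        )

    def forward(self, patches: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # patches: (N, patch_dim), coords: (N, 3) -> edited patches (N, patch_dim)
        w = torch.softmax(self.weight_field(coords), dim=-1)            # (N, K)
        per_anchor = torch.einsum("kij,nj->nki", self.anchors, patches)  # (N, K, D)
        return torch.einsum("nk,nkd->nd", w, per_anchor)
```

Training on the single before/after pair would fit the anchors and the weight field per frequency band; new images are then decomposed into the same patch space and transformed.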
ARF-Plus: Controlling Perceptual Factors in Artistic Radiance Fields for 3D Scene Stylization
Radiance field style transfer is an emerging area that has recently gained
popularity as a means of 3D scene stylization, thanks to the outstanding
performance of neural radiance fields in 3D reconstruction and view synthesis.
Motivated by perceptual control concepts established in 2D image style
transfer, we highlight a research gap in radiance field style transfer: the
lack of sufficient perceptual controllability. In this paper, we present
ARF-Plus, a 3D neural style transfer framework offering manageable control
over perceptual factors, to systematically explore perceptual controllability
in 3D scene stylization.
Four distinct types of control are proposed and integrated into this
framework: color preservation control, (style pattern) scale control, spatial
(selective stylization area) control, and depth enhancement control. Results
from real-world datasets, both quantitative and qualitative, show that the four
types of controls in our ARF-Plus framework successfully accomplish their
corresponding perceptual controls when stylizing 3D scenes. These techniques
work well for individual style inputs as well as for the simultaneous
application of multiple styles within a scene, enabling customized
stylization effects and flexible combination of the strengths of different
styles to create novel and eye-catching stylistic effects on 3D scenes.
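As a concrete illustration of one of these controls, color preservation in 2D style transfer is often implemented by transferring style only in luminance and restoring the content's chrominance. A hedged sketch of that classic post-processing step (ARF-Plus's formulation operates on radiance fields and may differ):

```python
import numpy as np

def preserve_color(content_rgb: np.ndarray, stylized_rgb: np.ndarray) -> np.ndarray:
    """Keep the stylized luminance but restore the content's colors.
    Classic 2D color-preservation control; images are floats in [0, 1],
    shape (H, W, 3)."""
    # RGB -> YUV conversion matrix (BT.601).
    rgb2yuv = np.array([[0.299, 0.587, 0.114],
                        [-0.14713, -0.28886, 0.436],
                        [0.615, -0.51499, -0.10001]])
    yuv_content = content_rgb @ rgb2yuv.T
    yuv_stylized = stylized_rgb @ rgb2yuv.T
    # Take luminance (Y) from the stylized image, chrominance (U, V)
    # from the original content.
    yuv = np.concatenate([yuv_stylized[..., :1], yuv_content[..., 1:]], axis=-1)
    return np.clip(yuv @ np.linalg.inv(rgb2yuv).T, 0.0, 1.0)
```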
D²NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
Given a monocular video, segmenting and decoupling dynamic objects while
recovering the static environment is a widely studied problem in machine
intelligence. Existing solutions usually approach this problem in the image
domain, limiting their performance and understanding of the environment. We
introduce Decoupled Dynamic Neural Radiance Field (D²NeRF), a
self-supervised approach that takes a monocular video and learns a 3D scene
representation which decouples moving objects, including their shadows, from
the static background. Our method represents the moving objects and the static
background by two separate neural radiance fields with only one allowing for
temporal changes. A naive implementation of this approach leads to the dynamic
component taking over the static one as the representation of the former is
inherently more general and prone to overfitting. To address this, we propose
a novel loss that promotes correct separation of the two phenomena. We further propose a
shadow field network to detect and decouple dynamically moving shadows. We
introduce a new dataset containing various dynamic objects and shadows and
demonstrate that our method can achieve better performance than
state-of-the-art approaches in decoupling dynamic and static 3D objects,
occlusion and shadow removal, and image segmentation for moving objects.
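Conceptually, the scene is rendered from the combined densities of the two fields, and the separation loss penalizes ambiguous point assignments while biasing explanations toward the static field. A minimal sketch of such a skewed binary-entropy regularizer (the skew exponent and the exact loss form used by the paper are assumptions here):

```python
import torch

def binary_entropy(p: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    p = p.clamp(eps, 1.0 - eps)
    return -(p * p.log() + (1.0 - p) * (1.0 - p).log())

def separation_loss(sigma_static: torch.Tensor,
                    sigma_dynamic: torch.Tensor,
                    skew: float = 2.0) -> torch.Tensor:
    """Discourage the dynamic field from explaining everything: push the
    dynamic density ratio toward 0 or 1, skewed toward 0 (static)."""
    ratio = sigma_dynamic / (sigma_static + sigma_dynamic + 1e-6)
    # Raising the ratio to skew > 1 biases the entropy minimum toward
    # the static explanation.
    return binary_entropy(ratio.pow(skew)).mean()
```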
Feature Preserving Point Set Surfaces based on Non-Linear Kernel Regression
Moving least squares (MLS) is a very attractive tool for designing effective meshless surface representations. However, as long as approximations are performed in a least-squares sense, the resulting definitions remain sensitive to outliers and smooth out small or sharp features. In this paper, we address these major issues and present a novel point-based surface definition combining the simplicity of implicit MLS surfaces [SOS04, Kol05] with the strength of robust statistics. To reach this new definition, we recast MLS surfaces in terms of local kernel regression, opening the door to a vast and well-established literature from which we adopt robust kernel regression. Our novel representation can handle sparse sampling, generates a continuous surface that better preserves fine details, and can naturally handle any kind of sharp feature with controllable sharpness. Finally, it combines ease of implementation with performance competitive with other non-robust approaches.
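The robust kernel regression view suggests an implementation via iteratively reweighted least squares: a spatial kernel weights nearby samples, and a robust weight down-weights samples whose residuals disagree with the current estimate. A minimal numpy sketch of a robust implicit MLS evaluation in that spirit (kernel choices and bandwidths are assumptions, not the paper's exact estimator):

```python
import numpy as np

def robust_imls_value(x, points, normals, h=0.1, sigma_r=0.05, iters=5):
    """Evaluate a robust implicit MLS scalar field at query point x.
    points: (N, 3) samples, normals: (N, 3) unit normals, x: (3,)."""
    diff = points - x                        # (N, 3)
    d2 = (diff ** 2).sum(axis=1)
    w_spatial = np.exp(-d2 / (2 * h * h))    # spatial Gaussian kernel
    # Signed distance from x to each sample's tangent plane.
    plane_dist = -(diff * normals).sum(axis=1)
    f = np.average(plane_dist, weights=w_spatial + 1e-12)
    for _ in range(iters):
        r = plane_dist - f                   # residuals w.r.t. current estimate
        # Welsch-type robust weight: outlier planes get little influence,
        # so sharp features are not smoothed out.
        w_robust = np.exp(-(r ** 2) / (2 * sigma_r ** 2))
        f = np.average(plane_dist, weights=w_spatial * w_robust + 1e-12)
    return f
```

The surface is then the zero set of this field; the residual bandwidth `sigma_r` plays the role of the controllable sharpness parameter.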
Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views
Neural view synthesis (NVS) is one of the most successful techniques for
synthesizing free viewpoint videos, capable of achieving high fidelity from
only a sparse set of captured images. This success has led to many variants of
the techniques, each evaluated on a set of test views typically using image
quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research
on how NVS methods perform with respect to perceived video quality. We present
the first study on perceptual evaluation of NVS and NeRF variants. For this
study, we collected two datasets of scenes captured in a controlled lab
environment as well as in the wild. In contrast to existing datasets, these
scenes come with reference video sequences, allowing us to test for temporal
artifacts and subtle distortions that are easily overlooked when viewing only
static images. We measured the quality of videos synthesized by several NVS
methods in a well-controlled perceptual quality assessment experiment as well
as with many existing state-of-the-art image/video quality metrics. We present
a detailed analysis of the results and recommendations for dataset and metric
selection for NVS evaluation.
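For reference, the standard per-view image metrics the study compares against human judgments can be computed as below; applying them frame by frame to videos ignores exactly the temporal artifacts the paper highlights. A sketch using scikit-image and the lpips package (assuming recent versions of both):

```python
import numpy as np
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

loss_fn = lpips.LPIPS(net='alex')  # perceptual metric, lower is better

def evaluate_view(reference: np.ndarray, synthesized: np.ndarray) -> dict:
    """reference/synthesized: float images in [0, 1], shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(reference, synthesized, data_range=1.0)
    ssim = structural_similarity(reference, synthesized,
                                 channel_axis=-1, data_range=1.0)
    # LPIPS expects (N, 3, H, W) tensors scaled to [-1, 1].
    to_t = lambda im: torch.from_numpy(im).permute(2, 0, 1)[None].float() * 2 - 1
    d = loss_fn(to_t(reference), to_t(synthesized)).item()
    return {"PSNR": psnr, "SSIM": ssim, "LPIPS": d}
```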
Human Shape from Silhouettes Using Generative HKS Descriptors and Cross-Modal Neural Networks
In this work, we present a novel method for capturing
human body shape from a single scaled silhouette. We combine deep correlated
features capturing different 2D views with embedding spaces based on 3D cues
in a novel convolutional neural network (CNN) based architecture. We first
train a CNN to find a richer body shape representation space from
pose-invariant 3D human shape descriptors. Then, we learn a mapping from
silhouettes to this representation space, with the help of a novel
architecture that exploits the correlation of multi-view data during training
to improve prediction at test time. We extensively validate our results on
synthetic and real data, demonstrating significant improvements in accuracy
compared to the state of the art, and providing a practical system for
detailed human body measurements from a single image.
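A minimal sketch of the two-stage idea, with illustrative layer sizes: a CNN encodes a silhouette into a pre-learned embedding space of 3D shape descriptors (e.g., HKS-based), and training regresses onto descriptor embeddings precomputed from meshes so that multiple 2D views map to a shared 3D code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SilhouetteToShapeNet(nn.Module):
    """Map a binary silhouette to a 3D shape-descriptor embedding.
    Layer sizes and the descriptor dimension are assumptions."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, silhouette: torch.Tensor) -> torch.Tensor:
        # silhouette: (N, 1, H, W) binary mask -> (N, embed_dim) shape code
        return self.encoder(silhouette)

# Training sketch: multiple silhouette views of one body share a single
# precomputed target embedding, tying the 2D inputs to one 3D code.
# loss = F.mse_loss(model(silhouette), target_shape_embedding)
```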
Neural Fields with Hard Constraints of Arbitrary Differential Order
While deep learning techniques have become extremely popular for solving a
broad range of optimization problems, methods to enforce hard constraints
during optimization, particularly on deep neural networks, remain
underdeveloped. Inspired by the rich literature on meshless interpolation and
its extension to spectral collocation methods in scientific computing, we
develop a series of approaches for enforcing hard constraints on neural fields,
which we refer to as Constrained Neural Fields (CNF). The constraints can be
specified as a linear operator applied to the neural field and its derivatives.
We also design specific model representations and training strategies for
problems where standard models may encounter difficulties, such as conditioning
of the system, memory consumption, and capacity of the network when being
constrained. Our approaches are demonstrated in a wide range of real-world
applications. Additionally, we develop a framework that enables highly
efficient model and constraint specification, which can be readily applied to
any downstream task where hard constraints need to be explicitly satisfied
during optimization. (37th Conference on Neural Information Processing
Systems, NeurIPS 2023)
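For intuition, the simplest hard constraints can be satisfied exactly by construction rather than by penalty. The sketch below bakes a 1D Dirichlet condition into a neural field; it illustrates the general notion of hard-constrained neural fields, not the paper's collocation-based CNF machinery, which handles arbitrary linear differential operators:

```python
import torch
import torch.nn as nn

class HardDirichletField(nn.Module):
    """Neural field u(x) on [0, 1] that satisfies u(0) = a and u(1) = b
    exactly, for any network weights."""

    def __init__(self, a: float, b: float, hidden: int = 64):
        super().__init__()
        self.a, self.b = a, b
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # g interpolates the boundary values; the multiplier x * (1 - x)
        # vanishes at both endpoints, so the network term cannot violate
        # the constraint no matter how training proceeds.
        g = self.a * (1 - x) + self.b * x
        return g + x * (1 - x) * self.net(x)
```

The field can then be trained against any interior objective while the boundary condition holds by construction; CNF generalizes this idea to constraints given by linear operators on the field and its derivatives.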
Controllable Shadow Generation Using Pixel Height Maps
Shadows are essential for realistic image compositing. Physics-based shadow
rendering methods require 3D geometries, which are not always available. Deep
learning-based shadow synthesis methods learn a mapping from the light
information to an object's shadow without explicitly modeling the shadow
geometry. Still, they lack control and are prone to visual artifacts. We
introduce pixel height, a novel geometry representation that encodes the
correlations between objects, the ground, and the camera pose. Pixel height
can be calculated from 3D geometry, manually annotated on 2D images, or
predicted from a single-view RGB image by a supervised approach. It can be used
to calculate hard shadows in a 2D image based on the projective geometry,
providing precise control of the shadows' direction and shape. Furthermore, we
propose a data-driven soft shadow generator to apply softness to a hard shadow
based on a softness input parameter. Qualitative and quantitative evaluations
demonstrate that the proposed pixel height significantly improves the quality
of the shadow generation while allowing for controllability.
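A simplified sketch of how a pixel-height map can yield a controllable hard shadow: each object pixel is dropped to its ground contact and offset along the light direction in proportion to its height. The parametrization below is a stand-in for the paper's exact projective-geometry formulation:

```python
import numpy as np

def hard_shadow_from_pixel_height(mask: np.ndarray,
                                  height: np.ndarray,
                                  light_dx: float,
                                  light_dy: float) -> np.ndarray:
    """mask: (H, W) bool object mask; height: (H, W) pixel heights.
    light_dx/light_dy parametrize the light direction (assumed form).
    Returns a bool hard-shadow mask on the ground plane."""
    H, W = mask.shape
    shadow = np.zeros((H, W), dtype=bool)
    ys, xs = np.nonzero(mask)
    h = height[ys, xs]
    # Ground contact point of each object pixel (straight down by its
    # pixel height), then an offset proportional to height and the light.
    gx = np.round(xs + h * light_dx).astype(int)
    gy = np.round(ys + h + h * light_dy).astype(int)
    valid = (gx >= 0) & (gx < W) & (gy >= 0) & (gy < H)
    shadow[gy[valid], gx[valid]] = True
    return shadow
```

Because the shadow is a deterministic function of the height map and the light parameters, its direction and shape can be edited directly; the paper's learned soft-shadow generator then adds controllable softness on top.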
3D GAN Inversion with Facial Symmetry Prior
Recently, a surge of high-quality 3D-aware GANs has emerged, leveraging the
generative power of neural rendering. It is natural to pair 3D GANs with GAN
inversion methods that project a real image into the generator's latent
space, allowing free-view consistent synthesis and editing, referred to as
3D GAN inversion. Although pre-trained 3D GANs preserve a strong facial
prior, reconstructing a 3D portrait from a single monocular image remains an
ill-posed problem. Straightforward application of 2D GAN inversion methods
focuses only on texture similarity while ignoring the correctness of the 3D
geometry, which can cause geometry collapse, especially when reconstructing
a side face under an extreme pose. Moreover, synthesized results in novel
views tend to be blurry. In this work, we propose a novel method to improve
3D GAN inversion by introducing a facial symmetry prior. We
design a pipeline and constraints to make full use of the pseudo auxiliary view
obtained via image flipping, which helps recover a robust and plausible
geometry during the inversion process. To enhance texture fidelity in
unobserved viewpoints, pseudo labels from depth-guided 3D warping can provide
extra supervision. We design constraints aimed at filtering out conflicting
areas for optimization in asymmetric situations. Comprehensive quantitative
and qualitative evaluations on image reconstruction and editing demonstrate
the superiority of our method. (Project page: https://feiiyin.github.io/SPI)
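A hedged sketch of the core optimization loop with the flipped pseudo auxiliary view: `generator`, `mirror_pose`, and the loss weights below are hypothetical stand-ins for a pre-trained 3D-aware GAN and its camera handling, and a full method would additionally mask out conflicting asymmetric regions:

```python
import torch
import torch.nn.functional as F

def invert_with_symmetry(generator, target, pose, mirror_pose,
                         w_init, steps=500, lr=0.01, lam_sym=0.5):
    """Optimize a latent code w so that generator(w, pose) matches the
    target image, with a symmetry prior from the flipped image.
    generator(w, pose) -> (1, 3, H, W) render at a camera pose;
    mirror_pose reflects the camera across the face's symmetry plane."""
    target_flipped = torch.flip(target, dims=[-1])  # pseudo auxiliary view
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(w, pose)
        recon_mirror = generator(w, mirror_pose)
        loss = F.l1_loss(recon, target)
        # Symmetry prior: the mirrored render should match the flipped
        # image, constraining geometry in otherwise unobserved views.
        loss = loss + lam_sym * F.l1_loss(recon_mirror, target_flipped)
        loss.backward()
        opt.step()
    return w.detach()
```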